
    LMethyR-SVM: Predict Human Enhancers Using Low Methylated Regions based on Weighted Support Vector Machines

    <div><p>Background</p><p>The identification of enhancers is a challenging task. Various types of epigenetic information, including histone modifications, have been used to construct enhancer prediction models based on a diverse panel of machine learning schemes. However, DNA methylation profiles generated by whole-genome bisulfite sequencing (WGBS) have not been fully explored for enhancer prediction, even though low methylated regions (LMRs) have been suggested to be distal active regulatory regions.</p><p>Method</p><p>In this work, we propose a prediction framework, LMethyR-SVM, that uses LMRs identified from cell-type-specific WGBS DNA methylation profiles together with a weighted support vector machine. In LMethyR-SVM, the set of cell-type-specific LMRs is further divided into three sets (reliable positive, likely positive and likely negative) according to their resemblance to a small set of experimentally validated enhancers in the VISTA database, based on an estimated non-parametric density distribution. The prediction model is then obtained by solving a weighted support vector machine.</p><p>Results</p><p>We demonstrate the performance of LMethyR-SVM using the WGBS DNA methylation profiles derived from the human embryonic stem cell type (H1) and the fetal lung fibroblast cell type (IMR90). The predicted enhancers are highly conserved and show a reasonable validation rate based on a set of commonly used positive markers, including transcription factor binding sites, p300 binding sites and DNase-I hypersensitive sites. In addition, we show that a large fraction of the enhancers predicted by LMethyR-SVM in the H1 cell type are not predicted by ChromHMM, and that they are more enriched for FANTOM5 enhancers.</p><p>Conclusion</p><p>Our work suggests that low methylated regions detected from WGBS data are useful as a complementary resource to histone modification marks in developing models for the prediction of cell-type-specific enhancers.</p></div>
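    The abstract names two steps: a density-based split of the LMRs into three groups and a weighted SVM. As a rough, non-authoritative sketch of how those steps could fit together, the Python below assumes each LMR has already been summarised by a single hypothetical similarity score (for the density step) and a small hypothetical feature vector (for the SVM). The Gaussian KDE, the quantile thresholds `hi_q`/`lo_q`, the per-group weights in `GROUP_WEIGHTS` and the subgradient-descent linear SVM are all illustrative choices, not details taken from LMethyR-SVM.

```python
import math

# Hypothetical (label, sample weight) per LMR group; the actual weights used by
# LMethyR-SVM are not given in the abstract.
GROUP_WEIGHTS = {
    "reliable_positive": (1, 1.0),
    "likely_positive": (1, 0.5),
    "likely_negative": (-1, 1.0),
}


def gaussian_kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate fitted to `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def partition_lmrs(lmr_scores, vista_scores, bandwidth=0.5, hi_q=0.5, lo_q=0.05):
    """Assign each LMR to a group by how typical its score is under the VISTA density."""
    density = gaussian_kde(vista_scores, bandwidth)
    ref = sorted(density(s) for s in vista_scores)  # densities at the VISTA enhancers themselves
    t_high = ref[int(hi_q * (len(ref) - 1))]
    t_low = ref[int(lo_q * (len(ref) - 1))]
    groups = []
    for s in lmr_scores:
        d = density(s)
        if d >= t_high:
            groups.append("reliable_positive")
        elif d >= t_low:
            groups.append("likely_positive")
        else:
            groups.append("likely_negative")
    return groups


def train_weighted_svm(X, y, c, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM minimising lam/2*||w||^2 + mean(c_i * hinge_i) by full-batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi, ci in zip(X, y, c):
            # Hinge loss is active only for points inside the margin.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    gw[j] -= ci * yi * xi[j] / n
                gb -= ci * yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy usage with made-up scores and features.
vista_scores = [0.0, 0.1, -0.1, 0.2, -0.2]
groups = partition_lmrs([0.0, 0.15, 5.0, 6.0], vista_scores)
X = [[2.0, 2.0], [3.0, 2.0], [-2.0, -2.0], [-3.0, -2.0]]
y = [GROUP_WEIGHTS[g][0] for g in groups]
c = [GROUP_WEIGHTS[g][1] for g in groups]
w, b = train_weighted_svm(X, y, c)
print(groups, [predict(w, b, x) for x in X])
```

Down-weighting the likely positives lets them pull on the decision boundary less than the reliable positives, which is the usual motivation for a weighted SVM when some training labels are uncertain.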

    The distributions of the LMRs and predicted enhancer windows.

    <p>(a): Distribution of the distances from the LMRs to their nearest TSSs; red for H1 and green for IMR90. (b): Distribution of the distances from all enhancer windows to their nearest TSSs; blue for H1 and brown for IMR90. Distance was measured from the center point of each sequence to its nearest TSS.</p>
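    The caption measures distance from the center point of each sequence to its nearest TSS. A minimal sketch of that measurement, assuming half-open `(chrom, start, end)` regions, a sorted list of TSS coordinates per chromosome and unsigned distances (the caption does not say whether upstream and downstream distances were distinguished); the region and TSS values are hypothetical:

```python
import bisect


def nearest_tss_distance(start, end, tss_positions):
    """Unsigned distance (bp) from the center of [start, end) to the nearest TSS.

    `tss_positions` must be a sorted list of TSS coordinates on the same chromosome.
    """
    if not tss_positions:
        raise ValueError("no TSSs on this chromosome")
    center = (start + end) // 2
    i = bisect.bisect_left(tss_positions, center)  # first TSS at or after the center
    candidates = []
    if i < len(tss_positions):
        candidates.append(tss_positions[i] - center)
    if i > 0:
        candidates.append(center - tss_positions[i - 1])
    return min(candidates)


def tss_distances(regions, tss_by_chrom):
    """Distances for (chrom, start, end) regions; tss_by_chrom maps chrom -> sorted TSS list."""
    return [nearest_tss_distance(s, e, tss_by_chrom[c]) for c, s, e in regions]


tss = {"chr1": [1000, 5000, 9000]}
lmrs = [("chr1", 4000, 4200), ("chr1", 0, 200), ("chr1", 9500, 9700)]
print(tss_distances(lmrs, tss))  # [900, 900, 600]
```

Binary search keeps each lookup logarithmic in the number of TSSs, which matters when the distances of hundreds of thousands of windows are histogrammed.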

    Comparison of the proportions of overlap between the enhancer windows predicted by LMethyR-SVM in H1 and ChromHMM-annotated enhancers in 9 cell types.


    Comparison of the conservation levels for the predicted enhancers.

    <p>Proportions of the predicted enhancers from each method that overlap conserved segments in the UCSC <i>PhastCons46Ways</i> conservation annotation at the vertebrate level. Each enhancer is represented by its midpoint (1 bp); (a) for H1 and (b) for IMR90.</p>
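    Because each enhancer is reduced to its 1-bp midpoint, the overlap proportion amounts to a point-in-interval test. A minimal sketch, assuming BED-style half-open, 0-based `(chrom, start, end)` tuples for both enhancers and conserved segments; overlapping segments are merged first so that a single binary search per midpoint is enough. All coordinates in the example are made up:

```python
import bisect


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged


def fraction_midpoints_in_segments(enhancers, segments):
    """Fraction of enhancers whose midpoint falls inside any conserved segment."""
    by_chrom = {}
    for chrom, s, e in segments:
        by_chrom.setdefault(chrom, []).append((s, e))
    by_chrom = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}
    hits = 0
    for chrom, s, e in enhancers:
        mid = (s + e) // 2  # 1-bp representative position
        iv = by_chrom.get(chrom, [])
        i = bisect.bisect_right(starts.get(chrom, []), mid) - 1  # last segment starting <= mid
        if i >= 0 and iv[i][0] <= mid < iv[i][1]:
            hits += 1
    return hits / len(enhancers) if enhancers else 0.0


conserved = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
enhancers = [("chr1", 180, 220), ("chr1", 400, 500), ("chr2", 10, 30), ("chr3", 0, 10)]
print(fraction_midpoints_in_segments(enhancers, conserved))  # 0.5
```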

    Summary of the enhancers predicted by LMethyR-SVM.


    Summary of the overlap between the predicted enhancers and the FANTOM5 enhancers.


    Results of comparison with other enhancer prediction models.

    <p>(a) for H1 and (b) for IMR90. “Validation” rates were computed as the percentages of predicted enhancers that overlap DHSs, p300 binding sites or enhancer-associated transcription factor binding sites (NANOG, CEBPB and TEAD4 for H1; CEBPB for IMR90); “Misclassification” rates were computed as the percentages of predicted enhancers that overlap UCSC-annotated TSSs. “Validated” enhancers are further divided into six mutually exclusive categories: “p300+/-DHS”, “DHS only”, “TF+DHS”, “TF only”, “TF+P300” and “p300+DHS+TF”. For LMethyR-SVM, the highest-scoring enhancer windows were used. The numbers of enhancers predicted in H1 by each method are 17,828 (ChromHMM Strong), 217,350 (ChromHMM Weak), 54,121 (RFECS), 37,263 (EnhancerFinder), 34,437 (LMethyR-SVM) and 34,437 (Random). The numbers of enhancers predicted in IMR90 by each method are 82,392 (RFECS), 35,203 (LMethyR-SVM) and 35,203 (Random).</p>
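    The six “validated” categories above cover every non-empty combination of TF, p300 and DHS overlap, with “p300+/-DHS” absorbing both p300 alone and p300 with DHS. The sketch below shows one reading of that partition together with the two rates; it assumes the per-enhancer overlap flags have already been computed, and the tuple layout `(tf, p300, dhs, tss)` is a hypothetical choice for illustration:

```python
from collections import Counter


def validation_category(tf, p300, dhs):
    """Map overlap flags to one of the six mutually exclusive categories, or None."""
    if tf and p300 and dhs:
        return "p300+DHS+TF"
    if tf and p300:
        return "TF+P300"
    if p300:  # p300 with or without DHS, no TF
        return "p300+/-DHS"
    if tf and dhs:
        return "TF+DHS"
    if tf:
        return "TF only"
    if dhs:
        return "DHS only"
    return None  # no positive marker: not validated


def summarize(flags):
    """flags: one (tf, p300, dhs, tss) tuple of booleans per predicted enhancer."""
    n = len(flags)
    categories = Counter()
    validated = misclassified = 0
    for tf, p300, dhs, tss in flags:
        cat = validation_category(tf, p300, dhs)
        if cat is not None:
            validated += 1
            categories[cat] += 1
        if tss:  # overlaps an annotated TSS
            misclassified += 1
    return {
        "validation_rate": 100.0 * validated / n,
        "misclassification_rate": 100.0 * misclassified / n,
        "categories": dict(categories),
    }


example = [
    (True, True, True, False),
    (False, True, False, False),
    (False, False, True, True),
    (False, False, False, False),
]
print(summarize(example))
```

Note that the two rates are computed independently, so a prediction can count as both validated and misclassified (the third example row).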

    Clustering Pseudomonad genomes using enzyme function counts.

    <p>The “Primer 6” core package and enzyme function profile data were used to generate hierarchical clusters. No obvious pattern by species or by ecological function is apparent when only enzyme function counts and hierarchical clustering are used, suggesting that additional data and/or alternative methods are required to deduce the environmental niche of Pseudomonads from sequenced and annotated genomes.</p>
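    For readers without Primer 6, the general idea of clustering genomes by enzyme function counts can be illustrated with a generic group-average agglomerative clustering over Bray-Curtis dissimilarities. This is a swapped-in technique chosen for illustration and is not claimed to reproduce the Primer 6 analysis; the genome names and count profiles below are hypothetical:

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def average_linkage(profiles):
    """profiles: dict genome -> list of enzyme function counts.

    Returns the merge history as (sorted member list, linkage distance) tuples.
    """
    names = list(profiles)
    dist = {(a, b): bray_curtis(profiles[a], profiles[b]) for a in names for b in names}
    clusters = {i: [name] for i, name in enumerate(names)}
    next_id = len(clusters)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = list(clusters)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                ca, cb = clusters[ids[i]], clusters[ids[j]]
                # Group average: mean dissimilarity over all cross-cluster pairs.
                d = sum(dist[(a, b)] for a in ca for b in cb) / (len(ca) * len(cb))
                if best is None or d < best[0]:
                    best = (d, ids[i], ids[j])
        d, i, j = best
        merged = clusters.pop(i) + clusters.pop(j)
        merges.append((sorted(merged), d))
        clusters[next_id] = merged
        next_id += 1
    return merges


profiles = {"A": [10, 0, 5], "B": [9, 1, 5], "C": [0, 10, 1], "D": [1, 9, 0]}
for members, d in average_linkage(profiles):
    print(members, round(d, 3))
```

The naive loop is cubic in the number of genomes, which is acceptable for a few dozen Pseudomonad genomes; for larger sets a library implementation such as SciPy's hierarchical clustering would be the practical choice.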

    Venn diagram for significant features identified by SVM for each model feature type and for each ecological niche.

    <p>All values in the diagram are percentages of the total number of high-weight features for the given SVM feature type.</p>
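    The percentages in a Venn diagram like this one can be derived by assigning each high-weight feature to the exact combination of niches it appears in. A minimal sketch, assuming the “total number of high-weight features” is the number of distinct features across all niches for one feature type; the niche names and feature IDs are hypothetical:

```python
def venn_percentages(sets):
    """sets: dict niche -> set of high-weight feature IDs for one SVM feature type.

    Returns {tuple of niches: percent of all distinct features in exactly that region}.
    """
    names = sorted(sets)
    universe = set().union(*sets.values())
    total = len(universe)
    if not total:
        return {}
    counts = {}
    for feature in universe:
        # The Venn region is the exact set of niches that contain this feature.
        region = tuple(n for n in names if feature in sets[n])
        counts[region] = counts.get(region, 0) + 1
    return {region: 100.0 * c / total for region, c in counts.items()}


high_weight = {"soil": {"f1", "f2", "f3"}, "host": {"f3", "f4"}}
print(venn_percentages(high_weight))
```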

    Assigned Ecological Niche Classifications of Pseudomonad Species.
